ABSTRACT

This paper presents a self-organizing protocol for dynamic
(unstructured P2P) overlay networks, which allows the overlay
to react to the variability of node arrivals and departures.
Through local interactions, the protocol prevents the departure
of nodes from partitioning the overlay. We show that
knowledge of 1st and 2nd neighbours, plus a simple P2P
interaction protocol, is sufficient to make unstructured
networks resilient to node faults. A simulation assessment
over different kinds of overlay networks demonstrates
the viability of the proposal.
C.4 [Performance of Systems]: Fault tolerance

1. INTRODUCTION

Peer-to-Peer (P2P) networks represent a specific use case
that can be analyzed through complex network theory [H] [7].
Fault tolerance of P2P networks is extremely important and
a plethora of studies examine the resilience dynamics in case
of churn, i.e. a high rate of node arrivals and departures [12] [14].
The main motivation is to understand whether algorithms 
locally executed by peers guarantee network resilience and 
allow preserving the characteristics of the P2P overlay, such 
as network connectivity and routing performance. Most of
these studies focus on structured P2P networks
[9] [1^. Structured P2P overlay networks are those where
links among nodes are created based on the contents held
by nodes. Examples of structured architectures are tree-
based and hierarchical content-dependent structures, as well
as P2P systems built using Distributed Hash Tables (DHTs)
[2] [8]. Due to the need to maintain such a structure,
reconfiguration algorithms are usually executed to preserve
the general characteristics of the network.

Conversely, in an unstructured P2P overlay network, links 
among nodes are established arbitrarily, i.e. they do not de-
pend on the contents being disseminated [5] [7] [10]. These
solutions are particularly simple to build. Thus, unstruc- 
tured overlays may be useful in very dynamic contexts. Due 
to this "freedom" in the creation and management of the 
overlay, an option is to avoid the definition of protocols in 
charge of reacting to node departures, leaving the overlay 
management to the attachment process only 16,. This com- 
monly guarantees that certain network properties are pre- 
served on a steady state [6]. However, in case of multiple 
node departures, some of the properties of these overlays 
may disappear [11] . For instance, the overlay might get par- 
titioned upon failure of links connecting different clusters. 
Alternatively, some important links might be lost that were 
playing a main role to keep a limited network diameter; as 
an example, in small worlds there are links among distant 
nodes, that strongly reduce the average shortest path length. 
While the P2P network is unstructured, it has a topology 
that provides certain characteristics. These should be main- 
tained, at least up to a certain extent, in order to provide 
some guarantees and the ability of the network to spread 
contents. 

The aim of this work is to study a local mechanism, which 
is in charge of reacting to node churn without the need to 
introduce a structure on the P2P overlay. We are looking 
for a decentralized algorithm that enables nodes to react 
to network changes, so that the general characteristics of 
the network topology are preserved. The mechanism requires
that each node n has knowledge only of its 1st neighbours
(i.e. friend nodes, directly connected to n) and 2nd
neighbours (i.e. friends of n's friends). Upon a neighbour
departure, each node is able to determine whether some 2nd
neighbour is no longer reachable. If this is the case, the node
creates a link with it; a very simple communication protocol is
employed to coordinate peers and prevent multiple
neighbours from creating a new link with the same node. This
way, no redundant links are created that might alter the
unstructured network topology.

The approach is designed to preserve links that connect
different components of the network. It is usual to have some
central node that routes an important part of the nodes'
messages. In complex network theory, several measures have
been introduced to characterize this phenomenon, e.g.
betweenness centrality [1] [14]. The calculation of these
metrics usually requires full (or partial) knowledge of
the network. Conversely, the aim of this work is to preserve
connectivity without such global knowledge, despite the
failure of some central node n, by replacing failed links with
new links among neighbours of n (which were not neighbours
before the failure of n). Some previous proposals dealt
with similar problems, e.g. [13] [17].

It is thus interesting to observe how the protocol per- 
forms over networks with different connected clusters. A 
simulation assessment is presented that studies the protocol 
over uniform networks, where links are created by randomly 
choosing nodes as neighbours, and clustered networks. Re- 
sults demonstrate that the presented approach preserves net- 
work connectivity and resilience, despite node churn. 

The remainder of this paper is organized as follows.
Section 2 presents the P2P protocol. Section 3 describes the
simulation environment and discusses the obtained results.
Finally, Section 4 provides some concluding remarks.

2. THE PROTOCOL 

Every node n maintains the list of its neighbours (1st
neighbours, $\Pi_n$), and the neighbours of its neighbours (2nd
neighbours, $\Pi^2_n$). The degree of n is the number of its 1st
neighbours, i.e. $|\Pi_n|$. Every time the list of 1st neighbours $\Pi_n$
changes, due to some node arrival or departure, n informs its
other 1st neighbours of this update. With $\Pi^2_{n,m} = \Pi_m - \Pi_n$,
we identify the 2nd neighbours of n which can be reached
through m. Hence, $\Pi^2_n = \bigcup_{m \in \Pi_n} \Pi^2_{n,m}$.
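The neighbourhood bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the adjacency-set representation, the function names and the toy graph are assumptions. The sketch also removes n itself from the set reached through m, which the set difference in the text leaves implicit.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # node -> set of 1st neighbours (assumed representation)


def second_neighbours_via(graph: Graph, n: int, m: int) -> Set[int]:
    """2nd neighbours of n reachable through m: neighbours of m that are
    neither 1st neighbours of n nor n itself (the latter is our assumption)."""
    return graph[m] - graph[n] - {n}


def second_neighbours(graph: Graph, n: int) -> Set[int]:
    """Union, over all 1st neighbours m of n, of the sets reached through m."""
    result: Set[int] = set()
    for m in graph[n]:
        result |= second_neighbours_via(graph, n, m)
    return result


# Toy overlay: 0 is linked to 1 and 2; node 3 is reachable via both, node 4 via 2 only.
toy = {0: {1, 2}, 1: {0, 3}, 2: {0, 3, 4}, 3: {1, 2}, 4: {2}}
via_1 = second_neighbours_via(toy, 0, 1)
all_second = second_neighbours(toy, 0)
```

Keeping the per-neighbour sets separately, rather than only their union, is what later lets a node tell which 2nd neighbours depended on a failed neighbour.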

The aim of the protocol is to prevent a node failure from
partitioning the network or from excessively increasing the
distance among nodes in terms of hops. It is well
known that, in networks, certain links can play a main role
in keeping the network diameter limited. For instance, small
worlds are characterized by links among distant nodes that
strongly reduce the average shortest path length. With this
in view, when a node f fails, each neighbour $n \in \Pi_f$ that
was employing f to reach some of its 2nd neighbours $\Pi^2_{n,f}$
reconfigures its links so that these nodes still remain in its
2nd neighbourhood, i.e. n checks that the nodes in $\Pi^2_{n,f}$ remain
in its 2nd neighbourhood, or it creates links with them.
Figure 1 shows a related example, which focuses on the links that
should be created at n, after a neighbour failure, so that its
2nd neighbourhood remains unaltered. Upon failure of the
central node, new links are created between n and a node
of each cluster. This has the following two advantages: i) the
network does not get disconnected, and ii) a minimum number
of links is created. (Note that the example focuses on n;
however, other links might be necessary for other nodes.)

According to the scheme, when a node $f \in \Pi_n$ fails,
$\forall p \in \Pi_f$ there are three possible cases:

• $p \in \Pi_n$: n and p are neighbours; there is nothing to
do;

• $p \notin \Pi_n$, but $p \in \Pi^2_n$; since $p \in \Pi^2_{n,q}$ for some
$q \in \Pi_n, q \neq f$, p is still a 2nd neighbour of n; there is
nothing to do;

• $p \notin \Pi_n, p \notin \Pi^2_n$: p is no longer a 2nd neighbour of n;
n takes part in the distributed procedure to create a
link with p (explained below).

In essence, links are created among nodes in different clus-
ters which were connected through f only.
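The three cases can be made concrete with a small Python sketch that computes, for a surviving node n, the former neighbours of the failed node f that n can no longer reach within two hops. The function name and the toy overlay are hypothetical; the graph is assumed to be a dictionary of adjacency sets that still contains f's links when the function is called.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # node -> set of 1st neighbours (assumed representation)


def lost_second_neighbours(graph: Graph, n: int, f: int) -> Set[int]:
    """Neighbours of the failed node f that are neither 1st neighbours of n
    (case 1) nor reachable through another neighbour q != f (case 2)."""
    former = graph[f] - {n}        # candidates p: neighbours of f other than n
    first = graph[n] - {f}         # 1st neighbours of n that survive the failure
    still_second: Set[int] = set()
    for q in first:                # 2nd neighbours still reachable via some q != f
        still_second |= graph[q] - {n, f}
    return {p for p in former if p not in first and p not in still_second}


# Hub 0 connects nodes 1, 2 and 3; nodes 1 and 2 are also directly linked.
hub = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
lost_at_1 = lost_second_neighbours(hub, 1, 0)  # node 2 is case 1, node 3 is case 3
lost_at_3 = lost_second_neighbours(hub, 3, 0)  # both 1 and 2 fall in case 3
```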

The distributed protocol is reported in Algorithms 1 and 2 and
works as follows. As mentioned, the procedure is activated
at node n after the failure of one of its neighbours, $f \in \Pi_n$.
All nodes $p \in \Pi_f \wedge p \notin \Pi_n \wedge p \notin \Pi^2_n$ are considered. The
protocol is made of two distinct parts: active (Algorithm 1)
and passive (Algorithm 2) behavior. The node n executes
the active behavior to create a link with some node only
while its current degree remains below a given threshold
value (Algorithm 1, line 2). This is a practical control to
prevent the degree of a node from growing out of control and
the network topology from changing radically. (Note that the
threshold must not be too low, otherwise it would hinder
the creation of additional links, and this might generate
network partitions.) Then, n waits a random time (line 3). This
is a typical contention-based approach, used to reduce the
probability that multiple nodes of the same cluster attempt
to create a new link with p at the same time [15]. Then,
n randomly selects a node p from the list of nodes to which
the cluster needs to be linked, and sends it a link creation
request (lines 4-5). These steps are repeated until the list
of lost 2nd neighbours becomes empty (line 2).

Algorithm 1 Failure management of f at node n: Active behavior

1: $P \leftarrow \{p \in \Pi_f \mid p \notin \Pi_n, p \notin \Pi^2_n\}$
2: while $(P \neq \emptyset) \wedge (|\Pi_n| < thresholdDegree)$ do
3:   wait random time
4:   $p \leftarrow$ extract random node from P
5:   send link creation request to p
6: end while
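A minimal Python sketch of the active behavior follows. It is not the authors' code: the function signature is an assumption, the random waiting time is omitted, and the network send is replaced by a callback. In the usage lines the callback simply adds the contacted node to the neighbour set, which simulates an immediate acceptance.

```python
import random


def active_behaviour(lost, degree, threshold_degree, send_request, rng=random):
    """Request links until the set of lost 2nd neighbours is empty or the
    degree cap is reached.

    lost: mutable set P of lost 2nd neighbours
    degree: callable returning the current number of 1st neighbours
    send_request: callable(p) standing in for the network send (hypothetical)
    """
    contacted = []
    while lost and degree() < threshold_degree:
        # The random wait of the algorithm is omitted in this synchronous sketch.
        p = rng.choice(sorted(lost))   # extract a random node from P
        lost.discard(p)
        send_request(p)
        contacted.append(p)
    return contacted


# Usage: a node with 2 neighbours and a threshold of 3 sends a single request,
# because the (simulated) acceptance raises its degree to the threshold.
neighbours = {1, 2}
pending = {5, 6, 7}
contacted = active_behaviour(pending, lambda: len(neighbours), 3,
                             neighbours.add, random.Random(0))
```

With a threshold equal to the current degree, the loop body never runs, which illustrates why a threshold that is too low can leave lost 2nd neighbours unlinked.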



Figure 1: Reconfiguration at n after a node failure 



The node reacts to received messages as follows (Algorithm 2).
Upon reception of a message answering a previous
link creation request from n to p (lines 2-7), if the answer
is positive, then n informs all its neighbours about its new
neighbour and creates the link with p. Upon reception at
n of a message from a neighbour q about the creation of a new
link between q and a node m (line 8), n removes m from P,
i.e. the list of nodes that the cluster needs to be linked with
(line 1). Moreover, n adds information about m to its cache,
so that n knows in the future that m is a 2nd neighbour
reachable through q, i.e. $m \in \Pi^2_{n,q}$ (line 9). Upon reception
of a message from a node p asking n to become neighbours,
n accepts the request only if p is not a 1st or 2nd neighbour
of n (it is possible that some of its neighbours just created
a link with p, lines 10-18). In this case, it sends a positive
answer to p and informs all the neighbours of n of this new
link (n,p); finally, it adds p to its neighbours.

Algorithm 2 Failure management of f at node n: Passive behavior

Require: message from p answering a link creation request
2: if answer is OK then
3:   for all $m \in \Pi_n$ do
4:     send message to m for link creation (n,p)
5:   end for
6:   add p to $\Pi_n$
7: end if

Require: message from $q \in \Pi_n$, for link creation (q,m), $m \in P$
8: extract m from P
9: add m to $\Pi^2_{n,q}$

Require: message from p with a link creation request
10: if $p \in \Pi^2_n$ then
11:   send refuse message
12: else
13:   send accept message
14:   for all $m \in \Pi_n$ do
15:     send message to m for link creation (n,p)
16:   end for
17:   add p to $\Pi_n$
18: end if
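The three message handlers of the passive behavior could look as follows in Python. This is a sketch under stated assumptions: the `Node` class, its field names, the message tuples and the `outbox` list (used here instead of a real transport) are all hypothetical. Following the prose description, a request is refused when the sender is already a 1st or a 2nd neighbour.

```python
class Node:
    """Minimal per-node state for the passive behavior (illustrative only)."""

    def __init__(self, node_id):
        self.id = node_id
        self.first = set()      # 1st neighbours
        self.second_via = {}    # neighbour q -> 2nd neighbours reachable via q
        self.lost = set()       # P: lost 2nd neighbours still to be linked
        self.outbox = []        # (destination, message) pairs, in place of a network

    def second(self):
        """All known 2nd neighbours."""
        return set().union(*self.second_via.values())

    def _announce_and_add(self, p):
        for m in self.first:    # inform current neighbours of the new link (n, p)
            self.outbox.append((m, ("new_link", self.id, p)))
        self.first.add(p)

    def on_answer(self, p, ok):
        """Answer to a link creation request previously sent to p."""
        if ok:
            self._announce_and_add(p)

    def on_new_link(self, q, m):
        """Neighbour q reports a new link (q, m)."""
        self.lost.discard(m)
        self.second_via.setdefault(q, set()).add(m)

    def on_link_request(self, p):
        """Node p asks to become a neighbour."""
        if p in self.first or p in self.second():
            self.outbox.append((p, ("refuse", self.id)))
            return False
        self.outbox.append((p, ("accept", self.id)))
        self._announce_and_add(p)
        return True


node = Node(1)
node.first = {2}
node.second_via = {2: {3}}
node.lost = {7, 8}
refused = node.on_link_request(3)   # 3 is already a 2nd neighbour
accepted = node.on_link_request(9)  # 9 is unknown, so the link is accepted
node.on_new_link(2, 7)              # neighbour 2 has linked to lost node 7
```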



The protocol prevents nodes of the same cluster from
generating multiple link creation requests towards a given
lost 2nd neighbour. This reduces the number of messages
sent in the network and prevents the network topology from
being altered significantly.

3. PERFORMANCE EVALUATION 

To assess the proposed protocol, we simulated the protocol 
over different network topologies. In the following, we de- 
scribe the simulation settings and the obtained results when 
running the protocol over uniform networks and clustered 
networks. Actually, during our experimental evaluation we 
employed also random graphs, obtaining results comparable 
to those obtained for uniform networks (hence, for the sake 
of brevity we do not show them in the paper). 

3.1 Simulation 

The simulator was written in GNU Octave. As mentioned, 
two types of unstructured networks were considered: uni- 
form networks and clustered networks. Uniform networks 
are those where all nodes start with the same degree. Then, 
due to node failures and arrivals (and the reconfiguration im- 
posed by the P2P protocol), the node degree might change. 
We varied the initial degree of nodes. During our tests, we 
set the threshold value for creating novel links in the pro- 
tocol (i.e. thresholdDegree in the algorithm) equal to the 
initial degree, i.e. the node runs the active behavior when 
its actual degree is below the initial degree. 

As a matter of fact, the P2P protocol is designed for those
networks that have important links that connect different
parts of the network; thus, it is interesting to observe how
the protocol performs over networks composed of different
connected clusters. In these simulations, network clusters were
set to be of the same size. We set two different parameters
to create the network. The first parameter is the probability
$\gamma$ of creating a link among nodes of the same cluster. Each
node is linked to another node of the same cluster with
probability $\gamma$; hence, inside a cluster, nodes are organized
as a classic random graph. As to inter-cluster links, the
number of links created between two clusters was
determined as a certain probability $\omega$ times the number of
nodes in the clusters (i.e. each node has a probability $\omega$ of
having a link with each external cluster).
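A possible generator for such clustered networks is sketched below in Python (the original simulator was written in GNU Octave, so this is not the authors' code). The function name and parameters are assumptions. In particular, the sketch visits each unordered pair of clusters once, so that the expected number of links between two clusters is about ω times the cluster size; other readings of the description are possible.

```python
import random
from itertools import combinations


def clustered_network(n_clusters, cluster_size, gamma, omega, seed=None):
    """Return (graph, clusters): adjacency sets and the node lists per cluster."""
    rng = random.Random(seed)
    clusters = [list(range(c * cluster_size, (c + 1) * cluster_size))
                for c in range(n_clusters)]
    graph = {v: set() for members in clusters for v in members}

    def link(u, v):
        graph[u].add(v)
        graph[v].add(u)

    # Intra-cluster links: each pair is linked with probability gamma.
    for members in clusters:
        for u, v in combinations(members, 2):
            if rng.random() < gamma:
                link(u, v)

    # Inter-cluster links: for each pair of clusters (a, b), every node of a
    # links, with probability omega, to a random node of b (assumed reading).
    for a, b in combinations(range(n_clusters), 2):
        for u in clusters[a]:
            if rng.random() < omega:
                link(u, rng.choice(clusters[b]))
    return graph, clusters


# 200 nodes in 4 clusters of 50 nodes each, as in the clustered scenarios.
net, groups = clustered_network(4, 50, gamma=0.5, omega=0.2, seed=1)
```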

Upon a node failure, all its links with other nodes are re- 
moved. The failed node is randomly selected among those 
active at that simulation step. Then, the node passes to 
an inactive state; it can be selected further on to simulate 
a novel node arrival. Thus, a node arrival is simulated by 
changing the state of a randomly selected inactive node to 
pass to the active state. This event triggers the creation 
of novel links with other randomly selected nodes. Differ- 
ent joining procedures were executed, depending on the net- 
work topology under investigation. As concerns uniform net- 
works, a random set of neighbours was selected, whose size 
was equal to the fixed degree that characterized the starting 
network topology. As concerns clustered networks, the node
was associated to a cluster, and links with nodes in that
cluster were randomly created based on the $\gamma$ probability,
as in a classic random graph. Then, for each other cluster,
the node creates, with probability $\omega$, a link with a random
node of that cluster.
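One simulation step of this churn model, for the uniform-network case, might be sketched as follows in Python. The helper names are hypothetical, and the sketch assumes that the node that has just failed is immediately eligible to rejoin with a fresh set of links, which the description allows but does not mandate.

```python
import random


def fail_node(graph, active, node):
    """Remove all links of `node` and mark it inactive."""
    for v in graph[node]:
        graph[v].discard(node)
    graph[node] = set()
    active.discard(node)


def join_uniform(graph, active, node, degree, rng):
    """Activate `node` and link it to `degree` randomly chosen active nodes."""
    candidates = sorted(active)
    for v in rng.sample(candidates, min(degree, len(candidates))):
        graph[node].add(v)
        graph[v].add(node)
    active.add(node)


def churn_step(graph, active, degree, rng):
    """One failure followed by one arrival; returns (failed, joined)."""
    failed = rng.choice(sorted(active))
    fail_node(graph, active, failed)
    inactive = sorted(set(graph) - active)  # includes the node that just failed
    joined = rng.choice(inactive)
    join_uniform(graph, active, joined, degree, rng)
    return failed, joined


# Ring of five active nodes plus one inactive node (5).
ring = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}, 5: set()}
alive = {0, 1, 2, 3, 4}
failed, joined = churn_step(ring, alive, degree=2, rng=random.Random(0))
```

Because every step pairs one failure with one arrival, the number of active nodes stays constant, which matches the stable-size scenario.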

3.2 Evolution with a stable network size 



[Plot: average size of main component; uniform network, threshold degree = initial degree]
Figure 2: Main component size; uniform network

In this section we study the performance of the P2P pro- 
tocol when the network is almost stable, i.e. when the node 
failure rate is equal to the node arrival rate. The network 
evolves with nodes leaving and joining the network, but the 
net size does not change during the system evolution. We 
present averages of obtained results from a corpus of 20 sim- 
ulations for the same scenario. During the simulations we 
removed the transient from the analyzed logs. 

First, we consider the case of uniform networks. During 
the simulations we varied several parameters such as network 
size and initial fixed node degree. Obtained results were 
similar in all the simulations, leading always to the same 
conclusions (hence, we show a limited number of scenarios). 

Figure 3: Average amount of 1st and 2nd neighbours; uniform network



Figure [2] shows the average fraction of nodes that are in 
the main component when the P2P protocol is executed (ON 
mode) or not (OFF mode). In this case, a network of 200 
nodes is considered, and the initial fixed degree was var- 
ied. As shown in the chart, the majority of nodes remain 
in the main component during the network evolution. This 
is perfectly reasonable since only a single node failure and 
node arrival was simulated at each simulation step. Hence, a 
uniform network cannot be subjected to important network 
partitions in this particular scenario. However, it is inter- 
esting to observe that while some nodes exit from the main 
component in the OFF mode, all the active nodes remain in 
the main component in the ON mode. 
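The main-component metric used in these charts can be computed with a breadth-first search over the active nodes. The following Python sketch shows one way to do it; the function name and the adjacency-set representation are assumptions, and the original Octave simulator may compute the metric differently.

```python
from collections import deque


def main_component_fraction(graph, active):
    """Fraction of active nodes that belong to the largest connected component."""
    if not active:
        return 0.0
    seen = set()
    largest = 0
    for start in active:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:                      # breadth-first visit of one component
            u = queue.popleft()
            size += 1
            for v in graph[u]:
                if v in active and v not in seen:
                    seen.add(v)
                    queue.append(v)
        largest = max(largest, size)
    return largest / len(active)


# Components {0, 1}, {2, 3, 4} and the isolated node 5.
sample = {0: {1}, 1: {0}, 2: {3}, 3: {2, 4}, 4: {3}, 5: set()}
fraction_all = main_component_fraction(sample, {0, 1, 2, 3, 4, 5})
fraction_pair = main_component_fraction(sample, {0, 1})
```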

Figure [3] shows the average number of 1st and 2nd neigh- 
bours of each node. It is possible to notice that, as expected, 
the higher the initial degree the higher the number of (1st 
and 2nd) neighbours. Moreover, the ON mode provides a 
higher amount of neighbours w.r.t. OFF mode. This corre- 
sponds to a higher connectivity for network nodes. 

When we consider a clustered network, we expect a similar
behavior, with the exception that the network might become
disconnected if the failing nodes hold links that connect
different clusters. In these simulations, a clustered network was
randomly created (based on $\gamma$, $\omega$); then, a certain number of
nodes was randomly selected, and these nodes were forced
to fail (in the following we show results when 5 nodes were
forced to fail at the beginning of the simulation). Then, at
each simulation step a single node was randomly selected to
fail and a new node arrival was triggered.

First, it is interesting to observe that the ON mode does 
not alter the topology of the network. Just as an example, 
Figure 4 reports a graphical representation of two snapshots
of a network composed of 200 nodes, at the beginning of the 
simulation and after 200 simulation steps. In this case, a 
particularly dense network was considered. It is possible to 
observe that the structure of the network is almost the same. 
Note that, due to the software employed to create these pic- 
torial representations, the position of nodes in the network 
is not fixed, i.e. a node that is in a certain position in the 
first snapshot is not positioned on the same coordinates in 
the other snapshot. Hence, only the general structure of the 
network should be considered here. The nodes without links 
in the second picture are those currently inactive (failed). 

Figures 5 and 6 show the average size of the main component
and the average amount of 1st and 2nd neighbours,
respectively, obtained in several configuration settings. In
particular, in the two charts we vary on the x-axis the value
of $\gamma$, i.e. the probability of creating links among nodes of
the same cluster, while leaving $\omega$ unaltered. In Figure 5 it
is possible to appreciate that the ON mode guarantees that
all nodes remain in the main component, while the OFF
mode is not able to do so. However, $\gamma$ does not significantly
influence the final result. Figure 6 shows that the ON
mode guarantees a higher average amount of (1st and 2nd)
neighbours, with respect to the OFF mode. The higher $\gamma$,
the higher the amount of neighbours.

Figure 7 shows the average size of the main component in
clustered networks, obtained when varying $\omega$. It is possible
to observe that the size of the main component grows with $\omega$.
In the ON mode, all nodes are in the main component, while
a non-negligible portion of nodes remains outside the main
component in the OFF mode. Figure 8 shows the amount
of 1st and 2nd neighbours; the ON mode provides a higher
amount of neighbours with respect to the OFF mode.
Moreover, the higher $\omega$, the higher the amount of 2nd neighbours
and the (slightly) higher the amount of 1st neighbours.

[Plot: average size of main component; $\omega$ = 0.5]
Figure 5: Average size of main component; $\gamma$ value
varied; clustered network; 200 nodes, 4 clusters (on
average, the number of inter-cluster links is $\omega$ times the
number of nodes per cluster)

Figure 6: Average amount of neighbours; $\gamma$ value
varied; clustered network; 200 nodes, 4 clusters (on
average, the number of inter-cluster links is $\omega$ times the
number of nodes per cluster)



Figure 4: Two snapshots of the clustered network structure during evolution; node failure rate equal to the
node join rate; 200 nodes, 4 clusters, $\gamma$ = 0.5, $\omega$ = 0.2 (hence on average there are 10 inter-cluster links)

Figure 7: Average size of main component; $\omega$ value
varied; clustered network; 200 nodes, 4 clusters (on
average, the number of inter-cluster links is $\omega$ times the
number of nodes per cluster)

Figure 8: Average amount of neighbours; $\omega$ value
varied; clustered network; 200 nodes, 4 clusters (on
average, the number of inter-cluster links is $\omega$ times the
number of nodes per cluster)

3.3 Resilience to node faults

It is interesting to understand how the protocol behaves
when the network experiences several node faults. Thus,
a simulation was performed where only failure events were
generated. In particular, the network started with a certain
topology; then nodes progressively failed until no nodes
remained active. Of course, this is not a realistic scenario.
However, it represents a worst case for assessing resilience to
faults. Hence, it allows us to understand whether the failure
management procedure explained in the previous section is able
to keep the network connected, despite the link removals
due to node faults.

Figure 9 shows the number of nodes that remain in the
main component during the evolution, while nodes
continuously fail. In this case, we consider uniform networks
composed of 200 nodes, starting with a fixed degree equal to
5. This figure shows the evolution of a specific network.
We repeated the same experiment multiple times, varying
the network size, the initial node degree, and the seed for
random generation, obtaining comparable results. It is
possible to see that, in the OFF mode, at a certain point of
the simulation the network gets disconnected and the
percentage of active nodes in the main component decreases
(at the end, the percentage of nodes in the main component
might increase, meaning that isolated nodes have failed).
Conversely, in the ON mode, active nodes remain connected
in the same, single component. This is confirmed by
Figure 10, which shows the number of nodes that
remain isolated. While the percentage of isolated nodes
increases in the OFF mode, no nodes remain isolated in the
ON mode. It is worth adding that during the simulation of
the OFF mode, we observed that some nodes formed
small components, which are not shown in the two figures.
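The difference between the ON and OFF modes can be illustrated with a simplified, centralized Python sketch of a single failure followed by the repair step. It is not the distributed protocol itself: messages are replaced by direct graph updates, the random waits by a random processing order, and all names are hypothetical. The example uses a star topology, where the failure of the hub isolates every leaf unless the repair is enabled.

```python
import random
from collections import deque


def within_two_hops(graph, n):
    """1st and 2nd neighbours of n in the current graph."""
    reach = set(graph[n])
    for q in graph[n]:
        reach |= graph[q]
    reach.discard(n)
    return reach


def fail_with_repair(graph, f, threshold_degree, rng, protocol_on=True):
    """Remove node f; in ON mode its former neighbours re-link lost 2nd neighbours."""
    former = sorted(graph[f])
    for v in former:
        graph[v].discard(f)
    del graph[f]
    if not protocol_on:
        return
    rng.shuffle(former)  # random processing order stands in for the random waits
    for n in former:
        lost = [p for p in former if p != n and p not in within_two_hops(graph, n)]
        while lost and len(graph[n]) < threshold_degree:
            p = lost.pop(rng.randrange(len(lost)))
            if p in within_two_hops(graph, n):
                continue  # p became reachable via an earlier new link: request refused
            graph[n].add(p)
            graph[p].add(n)


def main_component_fraction(graph):
    """Fraction of remaining nodes in the largest connected component."""
    seen, largest = set(), 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in graph[u] - seen:
                seen.add(v)
                queue.append(v)
        largest = max(largest, size)
    return largest / len(graph) if graph else 0.0


def star():
    return {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}


g_on, g_off = star(), star()
fail_with_repair(g_on, 0, threshold_degree=10, rng=random.Random(0))
fail_with_repair(g_off, 0, threshold_degree=10, rng=random.Random(0), protocol_on=False)
on_fraction = main_component_fraction(g_on)
off_fraction = main_component_fraction(g_off)
```

The threshold is deliberately generous here. With a threshold of 1, each leaf may stop after creating a single link, and the leaves can end up in disjoint pairs, which is the kind of partition a too-low threshold may cause.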

[Plot: nodes in the main component; net size = 200]
Figure 9: Main component size during a simulation;
progressive node failures; uniform network

Figure 10: Amount of isolated nodes during a simulation;
progressive node failures; uniform network

Figures 11-12 show the number of nodes in the main
component and the number of isolated nodes, respectively,
when we simulate progressive node failures in clustered
networks (they represent the results for a single simulation,
which are in perfect accordance with all other runs for the
same scenario). We show results for clustered networks of
200 nodes, with $\gamma$ = 0.1, $\omega$ = 0.1. From these results, it is
clear that (random) node faults have a strong impact on
network connectivity. In fact, in the OFF mode the network
gets disconnected and the number of active nodes in the
main component progressively decreases with node faults.
During the simulation we noticed the formation of minor
components (not shown in these figures). Moreover, an
increasing number of active nodes gets isolated from the rest
of the network, as shown in Figure 12. Conversely, the ON
mode allows the network to remain connected till the end
of the simulation, i.e. until all nodes have failed. This is an
important result that confirms that the use of a simple local
strategy guarantees the resilience of the unstructured
network, whatever its topology.







[Plot: nodes in the main component (y-axis) versus failed nodes (x-axis); net size = 200; ON and OFF modes]

Figure 11: Main component size during a simulation;
progressive node failures; clustered network;
$\gamma$ = 0.1, $\omega$ = 0.1; 200 nodes, 4 clusters (on average,
the number of inter-cluster links is $\omega$ times the number of
nodes per cluster)



Figure 12: Amount of isolated nodes during a simulation;
progressive node failures; clustered network;
$\gamma$ = 0.1, $\omega$ = 0.1; 200 nodes, 4 clusters (on average,
the number of inter-cluster links is $\omega$ times the number of
nodes per cluster)

4. CONCLUSIONS

Outcomes confirm that it is possible to guarantee resilience
to node failures in unstructured P2P overlay networks. The
use of simple and local protocols avoids network
disconnections. To demonstrate this, a very simple approach has been
studied, which requires nodes to know their 1st and 2nd
neighbours. The protocol prevents an excessive number of
links from being created among nodes in reaction to a single
node departure, thus preserving the network topology. In essence,
the presented approach guarantees that the overlay network
remains connected, without the need (and the related costs)
to add a structure to the overlay.